Review for NeurIPS paper: Hierarchical Poset Decoding for Compositional Generalization in Language

Neural Information Processing Systems

Weaknesses: First, I feel that their poset decoding framework is not really a general-purpose approach to achieving compositional generalization in language, but is somewhat specialized for decoding logical forms that contain a set of conjunctive/disjunctive clauses, especially for generating SPARQL or SQL queries. In this space, some existing work already proposes techniques for synthesizing unordered SQL clauses, e.g., [1][2]. In particular, although [2] does not consider compositional generalization, the hierarchical decoding process in this paper shares some high-level similarities with the sketch-based query synthesis approach in [2]. This paper lacks a discussion of related work on natural-language-to-SQL synthesis. Second, though the results on CFQ are impressive, some important details are missing.


Sparse Regression for Machine Translation

Biçici, Ergun

arXiv.org Artificial Intelligence

We use transductive regression techniques to learn mappings between source and target features of given parallel corpora, and use these mappings to generate machine translation outputs. We show the effectiveness of $L_1$-regularized regression (lasso) over $L_2$-regularized regression for learning mappings between sparsely observed feature sets. Proper selection of training instances plays an important role in learning correct feature mappings within limited computational resources and at expected accuracy levels; we introduce the "dice" instance selection method for this purpose, improving the source and target coverage of the training set. We show that $L_1$-regularized regression performs better than $L_2$-regularized regression both in regression measurements and in translation experiments using graph decoding. We present encouraging results when translating from German to English and from Spanish to English. We also demonstrate results when the phrase table of a phrase-based decoder is replaced with the mappings found by the regression model.
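
As a rough, self-contained illustration of the $L_1$-vs-$L_2$ contrast the abstract highlights (a toy sparse-recovery setup, not the paper's transductive regression over translation features), scikit-learn's Lasso and Ridge estimators can be compared on a synthetic sparse mapping:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
# Sparse mapping: only 5 of 100 source features influence the target feature.
X = rng.normal(size=(200, 100))
true_w = np.zeros(100)
true_w[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]
y = X @ true_w + 0.01 * rng.normal(size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

# L1 regularization drives most irrelevant weights exactly to zero,
# recovering the sparse structure; L2 merely shrinks the weights,
# leaving a dense coefficient vector.
lasso_nonzero = int(np.sum(np.abs(lasso.coef_) > 1e-6))
ridge_nonzero = int(np.sum(np.abs(ridge.coef_) > 1e-6))
print(lasso_nonzero, ridge_nonzero)
```

With sparsely observed feature sets, this exact-zeroing behavior is why the lasso can outperform ridge regression in the setting the abstract describes.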


Translating from Morphologically Complex Languages: A Paraphrase-Based Approach

Nakov, Preslav, Ng, Hwee Tou

arXiv.org Artificial Intelligence

We propose a novel approach to translating from a morphologically complex language. Unlike previous research, which has targeted word inflections and concatenations, we focus on the pairwise relationship between morphologically related words, which we treat as potential paraphrases and handle using paraphrasing techniques at the word, phrase, and sentence level. An important advantage of this framework is that it can cope with derivational morphology, which has so far remained largely beyond the capabilities of statistical machine translation systems. Our experiments translating from Malay, whose morphology is mostly derivational, into English show significant improvements over rivaling approaches based on five automatic evaluation measures (for 320,000 sentence pairs; 9.5 million English word tokens).


Improved statistical machine translation using monolingual paraphrases

Nakov, Preslav

arXiv.org Artificial Intelligence

We propose a novel monolingual sentence paraphrasing method for augmenting the training data for statistical machine translation systems "for free" -- by creating it from data that is already available rather than having to create more aligned data. Starting with a syntactic tree, we recursively generate new sentence variants where noun compounds are paraphrased using suitable prepositions, and vice-versa -- preposition-containing noun phrases are turned into noun compounds. The evaluation shows an improvement equivalent to 33%-50% of that of doubling the amount of training data.


Unsupervised Statistical Machine Translation

Artetxe, Mikel, Labaka, Gorka, Agirre, Eneko

arXiv.org Artificial Intelligence

While modern machine translation has relied on large parallel corpora, a recent line of work has managed to train Neural Machine Translation (NMT) systems from monolingual corpora only (Artetxe et al., 2018c; Lample et al., 2018). Despite the potential of this approach for low-resource settings, existing systems are far behind their supervised counterparts, limiting their practical interest. In this paper, we propose an alternative approach based on phrase-based Statistical Machine Translation (SMT) that significantly closes the gap with supervised systems. Our method profits from the modular architecture of SMT: we first induce a phrase table from monolingual corpora through cross-lingual embedding mappings, combine it with an n-gram language model, and fine-tune hyperparameters through an unsupervised MERT variant. In addition, iterative backtranslation improves results further, yielding, for instance, 14.08 and 26.22 BLEU points in WMT 2014 English-German and English-French, respectively, an improvement of more than 7-10 BLEU points over previous unsupervised systems, and closing the gap with supervised SMT (Moses trained on Europarl) down to 2-5 BLEU points. Our implementation is available at https://github.com/artetxem/monoses.
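
The first pipeline stage, inducing a phrase table from cross-lingually mapped embeddings, can be sketched in miniature. The function name, toy vocabularies, and the softmax-over-cosine scoring below are illustrative assumptions, not the paper's actual implementation (which works with n-gram phrase embeddings and feeds the table into a full Moses-style decoder):

```python
import numpy as np

def induce_phrase_table(src_emb, tgt_emb, src_vocab, tgt_vocab, k=2, temp=10.0):
    """Toy translation-table induction: for each source entry, keep the
    k nearest target entries by cosine similarity in the shared embedding
    space, with scores given by a temperature-scaled softmax."""
    # Normalize rows so dot products are cosine similarities.
    s = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    t = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = s @ t.T
    table = {}
    for i, word in enumerate(src_vocab):
        top = np.argsort(-sims[i])[:k]
        scores = np.exp(temp * sims[i][top])
        scores /= scores.sum()
        table[word] = [(tgt_vocab[j], float(p)) for j, p in zip(top, scores)]
    return table

# Hypothetical 2-d embeddings already mapped into a shared space.
src_vocab = ["hund", "katze"]
tgt_vocab = ["dog", "cat", "house"]
src_emb = np.array([[1.0, 0.1], [0.1, 1.0]])
tgt_emb = np.array([[0.9, 0.2], [0.2, 0.9], [0.5, 0.5]])
table = induce_phrase_table(src_emb, tgt_emb, src_vocab, tgt_vocab)
```

In the real system, the resulting table entries become translation-probability features in the SMT log-linear model, whose weights the unsupervised MERT variant then tunes.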


Integrating Rules and Dictionaries from Shallow-Transfer Machine Translation into Phrase-Based Statistical Machine Translation

Sánchez-Cartagena, Víctor M., Pérez-Ortiz, Juan Antonio, Sánchez-Martínez, Felipe

Journal of Artificial Intelligence Research

We describe a hybridisation strategy whose objective is to integrate linguistic resources from shallow-transfer rule-based machine translation (RBMT) into phrase-based statistical machine translation (PBSMT). It basically consists of enriching the phrase table of a PBSMT system with bilingual phrase pairs matching transfer rules and dictionary entries from a shallow-transfer RBMT system. This new strategy takes advantage of how the linguistic resources are used by the RBMT system to segment the source-language sentences to be translated, and overcomes the limitations of existing hybrid approaches that treat the RBMT systems as a black box. Experimental results confirm that our approach delivers translations of higher quality than existing ones, and that it is especially useful when the parallel corpus available for training the SMT system is small or when translating out-of-domain texts that are well covered by the RBMT dictionaries. A combination of this approach with a recently proposed unsupervised shallow-transfer rule inference algorithm results in a significantly greater translation quality than that of a baseline PBSMT; in this case, the only hand-crafted resources used are the dictionaries commonly used in RBMT. Moreover, the translation quality achieved by the hybrid system built with automatically inferred rules is similar to that obtained by those built with hand-crafted rules.


Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Cho, Kyunghyun, van Merrienboer, Bart, Gulcehre, Caglar, Bahdanau, Dzmitry, Bougares, Fethi, Schwenk, Holger, Bengio, Yoshua

arXiv.org Machine Learning

In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
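
The encode-into-a-fixed-vector / decode-from-that-vector idea can be sketched minimally as follows. This uses plain tanh RNN cells with random, untrained weights purely to show the data flow; the paper's model uses gated hidden units and trains all parameters jointly to maximize the conditional probability of the target sequence:

```python
import numpy as np

rng = np.random.default_rng(1)
V, H = 8, 16  # toy vocabulary size and hidden size

# Randomly initialized parameters; a real system trains these jointly.
E = rng.normal(0, 0.1, (V, H))      # token embeddings (shared for simplicity)
W_enc = rng.normal(0, 0.1, (H, H))
U_enc = rng.normal(0, 0.1, (H, H))
W_dec = rng.normal(0, 0.1, (H, H))
U_dec = rng.normal(0, 0.1, (H, H))
W_out = rng.normal(0, 0.1, (H, V))

def encode(src_tokens):
    """Compress a source token sequence into one fixed-length vector c."""
    h = np.zeros(H)
    for tok in src_tokens:
        h = np.tanh(E[tok] @ W_enc + h @ U_enc)
    return h

def decode_step(prev_tok, h, c):
    """One decoder step, conditioned on the fixed summary vector c;
    returns the new hidden state and a distribution over the vocabulary."""
    h = np.tanh(E[prev_tok] @ W_dec + h @ U_dec + c)
    logits = h @ W_out
    probs = np.exp(logits - logits.max())
    return h, probs / probs.sum()

c = encode([3, 1, 4])                 # fixed-length source representation
h, p = decode_step(0, np.zeros(H), c)  # first decoder step from a start token
```

Scoring a candidate phrase pair amounts to multiplying the decoder's per-step probabilities of the target tokens, which is how the model supplies an additional feature to the SMT log-linear model.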


Improving Statistical Machine Translation for a Resource-Poor Language Using Related Resource-Rich Languages

Nakov, P., Ng, H. T.

Journal of Artificial Intelligence Research

We propose a novel language-independent approach for improving machine translation for resource-poor languages by exploiting their similarity to resource-rich ones. More precisely, we improve the translation from a resource-poor source language X_1 into a resource-rich language Y given a bi-text containing a limited number of parallel sentences for X_1-Y and a larger bi-text for X_2-Y for some resource-rich language X_2 that is closely related to X_1. This is achieved by taking advantage of the vocabulary overlap and the similarities between X_1 and X_2 in spelling, word order, and syntax: (1) we improve the word alignments for the resource-poor language, (2) we augment the translation model with additional translation options, and (3) we handle potential spelling differences through appropriate transliteration. The evaluation for Indonesian -> English using Malay, and for Spanish -> English using Portuguese (pretending Spanish is resource-poor), shows an absolute gain of up to 1.35 and 3.37 BLEU points, respectively, which is an improvement over the best rivaling approaches, while using much less additional data. Overall, our method cuts the amount of necessary "real" training data by a factor of 2-5.